We review the recent progress in the investigation of powerfree words, with particular 
emphasis on binary cubefree and ternary squarefree words. Besides various bounds 
on the entropy, we provide bounds on letter frequencies and consider their empirical 
distribution obtained by an enumeration of binary cubefree words up to length 80. 



1 Introduction 

The interest in combinatorics on words goes back to the work of Axel Thue at the beginning of 
the 20th century [1]. He showed, in particular, that the famous morphism 

\[ 0 \mapsto 01, \qquad 1 \mapsto 10, \tag{1} \]

called the Thue-Morse morphism since the work of Morse [2], is cubefree. Its iteration on the initial 
word 0 produces an infinite cubefree word 

0110100110010110100101100110100110010110011010010110100110010110 . . . 

over a binary alphabet, which means that it does not contain any subword of the form $0^3 = 000$, 
$1^3 = 111$, $(01)^3 = 010101$, $(10)^3 = 101010$ and so forth. Moreover, the statement that the 
morphism is cubefree means that it maps any cubefree word to a cubefree word, so it preserves 
this property. Generally, the iteration of a powerfree morphism is a convenient way to produce 
infinite powerfree words. 
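
As a small illustration of this iteration, the following Python sketch applies the Thue-Morse morphism repeatedly to the word 0 and checks the result for cubes by brute force. The function names `apply_morphism`, `iterate` and `contains_cube`, as well as the dictionary representation of the morphism, are choices made for this sketch only and are not taken from the literature.

```python
# Thue-Morse morphism 0 -> 01, 1 -> 10, stored as a dictionary (an assumption of this sketch).
THUE_MORSE = {"0": "01", "1": "10"}


def apply_morphism(g, word):
    """Apply the morphism g letter by letter and concatenate the images."""
    return "".join(g[letter] for letter in word)


def iterate(g, start, steps):
    """Apply g to the word `start` a total of `steps` times."""
    word = start
    for _ in range(steps):
        word = apply_morphism(g, word)
    return word


def contains_cube(word):
    """Brute-force test: does some factor of `word` have the form uuu with u non-empty?"""
    n = len(word)
    for period in range(1, n // 3 + 1):
        for i in range(n - 3 * period + 1):
            u = word[i:i + period]
            if (word[i + period:i + 2 * period] == u
                    and word[i + 2 * period:i + 3 * period] == u):
                return True
    return False


# Six iterations on the single letter 0 give a word of length 2**6 = 64.
prefix = iterate(THUE_MORSE, "0", 6)
print(prefix)
print(contains_cube(prefix))
```

The brute-force cube test is cubic in the word length at worst, which is adequate for short prefixes but not for long words.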

The investigation of powerfree, or more generally of pattern-avoiding words, is one particular 
aspect of combinatorics on words; we refer the reader to the book series [3, 4, 5] for a comprehensive 
overview of the area, including algebraic formulations and applications. The area has attracted 
considerable activity in the past decades [6-24], and continues to do so; see 
[25, 26, 27, 28, 29, 30, 31, 32, 33] for some recent work. 
Beyond the realm of combinatorics on words and coding theory, substitution sequences, such as 
the Thue-Morse sequence, have been investigated for instance in the context of symbolic dynamics 
[34, 35, 36] and aperiodic order [37], to name but two. In the latter case, one is interested in systems 
which display order without periodicity, and substitution sequences often provide paradigmatic 
models, which are used in many applications in physics and materials science. However, sequences 
produced by a substitution such as in Eq. (1) have subexponential complexity and hence zero 
combinatorial entropy, cf. Definition 12 below. A natural generalisation to interesting sets of 
positive entropy is provided by powerfree or pattern-avoiding words. 

In this article, we review the recent progress on powerfree words, with emphasis on the two 
'classic' cases of binary cubefree and ternary squarefree words. We include a summary of relevant 
results which are scattered over 25 years of literature, and also discuss some new results as well 
as conjectures on cubefree morphisms and letter frequencies in binary cubefree words. 

The first term of interest is the combinatorial entropy of the set of powerfree words. Due to 
the fact that every subword of a powerfree word is again powerfree, the entropy of powerfree words 
exists as a limit. It is a measure for the exponential growth rate of the number of powerfree words 
of length $n$. Unfortunately, neither an explicit expression for the entropy of $k$-powerfree words 
nor an easy way to compute it numerically is known. Nevertheless, there are several strategies 
to derive upper and lower bounds for this limit. Upper bounds can be obtained, for example, 
by enumeration of all powerfree words up to a certain length, or by the derivation of generating 
functions for the number of powerfree words, see Section 4. Until recently all methods to achieve 
lower bounds relied on powerfree morphisms. However, the lower bounds obtained in this way 
are not particularly good, since they are considerably smaller than the upper bounds as well as 
reliable numerical estimates of the actual value of the entropy. A completely different approach 
introduced recently by Kolpakov [29], which amounts to choosing a parameter value to satisfy a 
number of inequalities derived from a Perron-Frobenius-type argument, provides surprisingly good 
lower bounds for the entropy of ternary squarefree and binary cubefree words. 

In the following section, we briefly introduce the notation and basic terminology; see [3] for a 
more detailed introduction. We continue with a summary of results on $k$-powerfree morphisms, 
which can be used to derive lower bounds for the corresponding entropy. We then proceed by 
introducing the entropy of $k$-powerfree words and summarise the methods to derive upper and 
lower bounds in general, and for binary cubefree and ternary squarefree words in particular. We 
conclude with a discussion of the frequencies of letters in binary cubefree and ternary squarefree 
words. 

2 Powerfree words and morphisms 

Define an alphabet $A$ as a finite non-empty set of symbols called letters. The cardinality of $A$ is 
denoted by $\mathrm{Card}(A)$. Finite or infinite sequences of elements from $A$ are called words. The empty 
word is denoted by $\varepsilon$. The set of all finite words, the operation of concatenation of words and the 
empty word $\varepsilon$ form the free monoid $A^*$. The free semigroup generated by $A$ is $A^+ := A^* \setminus \{\varepsilon\}$. 

The length of a word $u \in A^*$, denoted by $|u|$, is the number of letters that $u$ consists of. The 
length of the empty word is $|\varepsilon| := 0$. 

For two words $u, v \in A^*$, we say that $v$ is a subword or a factor of $u$ if there are words $x, y \in A^*$ 
such that $u = xvy$. If $x = \varepsilon$, the factor $v$ is called a prefix of $u$, and if $y = \varepsilon$, $v$ is called a suffix of 
$u$. Given a set of words $X \subset A^*$ (here and in what follows, the symbol $\subset$ is meant to include the 
possibility that both sets are equal), the set of all factors of words in $X$ is denoted by $\mathrm{Fact}(X)$. 

A map $g\colon A^* \to B^*$, where $A$ and $B$ are alphabets, is called a morphism if $g(uv) = g(u)g(v)$ 
holds for all $u, v \in A^*$. Obviously, a morphism $g$ is completely determined by $g(a)$ for all $a \in A$, 
and satisfies $g(\varepsilon) = \varepsilon$. A morphism $g\colon A^* \to B^*$ is called $\ell$-uniform, if $|g(a)| = \ell$ for all $a \in A$. 

For a word $u$, we define $u^0 := \varepsilon$, $u^1 := u$ and, for an integer $k > 1$, the power $u^k$ as the 
concatenation of $k$ occurrences of the word $u$. If $u \neq \varepsilon$, $u^k$ is called a $k$-power. A word $v$ contains 
a $k$-power if at least one of its factors is a $k$-power. If a word does not contain any $k$-power as 
a factor, it is called $k$-powerfree. If a word does not contain the $k$-power of any word up to a 
certain length $p$ as a factor, it is called length-$p$ $k$-powerfree, i.e., $w = xu^ky$ implies that $u = \varepsilon$ 
whenever $x, u, y \in A^*$ with $|u| \le p$. We denote the set of $k$-powerfree words in an alphabet $A$ by 
$F^{(k)}(A) \subset A^*$ and the set of length-$p$ $k$-powerfree words by $F^{(k,p)}(A) \subset A^*$. By definition, the empty 
word $\varepsilon$ is $k$-powerfree for all $k$. A word $w \in A^*$ is called primitive, if $w = v^n$, with $v \in A^*$ and 
$n \in \mathbb{N}$, implies that $n = 1$, meaning that $w$ is not a proper power of another word $v$. 
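
The definitions above translate directly into a few lines of Python. The following is a minimal sketch, assuming that words are represented as Python strings and that $k \ge 2$; the function names are illustrative only. It uses the observation that a word contains a $k$-power $u^k$ with $|u| = q$ exactly when there are $(k-1)q$ consecutive positions $i$ with equal letters at positions $i$ and $i + q$.

```python
def contains_k_power(word, k, max_period=None):
    """Return True if some factor of `word` is u^k with u non-empty.

    If max_period is given, only powers with len(u) <= max_period are considered
    (this corresponds to the 'length-p' notion with p = max_period).
    Assumes k >= 2.
    """
    n = len(word)
    longest = n // k if max_period is None else min(max_period, n // k)
    for period in range(1, longest + 1):
        run = 0  # number of consecutive positions i with word[i] == word[i + period]
        for i in range(n - period):
            run = run + 1 if word[i] == word[i + period] else 0
            if run >= (k - 1) * period:
                return True
    return False


def is_k_powerfree(word, k):
    return not contains_k_power(word, k)


def is_length_p_k_powerfree(word, k, p):
    return not contains_k_power(word, k, max_period=p)


def is_primitive(word):
    """A non-empty word is primitive iff it occurs in word+word only at offsets 0 and len(word)."""
    return len(word) > 0 and (word + word).find(word, 1) == len(word)


print(is_k_powerfree("0110100110010110", 3))     # a prefix of the Thue-Morse word
print(is_length_p_k_powerfree("010101", 3, 1))   # no cube of a single letter
print(is_length_p_k_powerfree("010101", 3, 2))   # but (01)^3 is a cube of a word of length 2
```

In this sketch the empty word is treated as not primitive, since $\varepsilon = \varepsilon^n$ for every $n$.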

A morphism $g\colon A^* \to B^*$ is called $k$-powerfree, if $g(u)$ is $k$-powerfree for every $k$-powerfree 
word $u$. In other words, $g$ is $k$-powerfree if $g(F^{(k)}(A)) \subset F^{(k)}(B)$. A test-set for $k$-powerfreeness 
of morphisms on an alphabet $A$ is a set $T \subset A^*$ such that, for any morphism $g\colon A^* \to B^*$, $g$ is 
$k$-powerfree if and only if $g(T)$ is $k$-powerfree. A morphism is called powerfree if it is a $k$-powerfree 
morphism for every $k \ge 2$. 

In particular, 2-powerfree and 3-powerfree words and morphisms are called squarefree and 
cubefree, respectively. A morphism from $A^*$ to $B^*$ with $\mathrm{Card}(A) = 2$ is also called a binary 
morphism. The notion of powerfreeness can be extended to non-integer powers; see, for instance, 
Ref. for an investigation of $k$-powerfree binary words for $k \ge 2$. However, in this article we 
shall concentrate on the cases $k = 2$ and $k = 3$, and hence restrict the discussion to integer powers. 

3 Characterisations of $k$-powerfree morphisms 



In what follows, we summarise a number of relevant results on $k$-powerfree morphisms. In particular, 
we are interested in the question how to test a specified morphism for $k$-powerfreeness. We 
start with results relating to the case $k = 2$. 

3.1 Characterisations of squarefree morphisms 

A sufficient (but in general not necessary) condition for the squarefreeness of a morphism has been 
known since 1979. 

Theorem 1 (Bean et al. [5]). A morphism $g\colon A^* \to B^*$ is squarefree if 

(i) $g(w)$ is squarefree for every squarefree word $w \in A^*$ of length $|w| \le 3$; 

(ii) $a = b$ whenever $a, b \in A$ and $g(a)$ is a factor of $g(b)$. 

If the morphism $g$ is uniform, this condition is in fact also necessary, because in this case $g(a)$ 
being a factor of $g(b)$ implies that $g(a) = g(b)$. If $a, b \in A$ exist with $a \neq b$ and $g(a) = g(b)$, then 
clearly $g$ is not squarefree since $g(ab) = g(a)g(b)$ is a square. This gives the following corollary. 

Corollary 2. A uniform morphism $g\colon A^* \to B^*$ is squarefree if and only if $g(w)$ is squarefree for 
every squarefree word $w \in A^*$ of length $|w| \le 3$. 

This corollary corresponds to Brandenburg's Theorem 2 in Ref. [11], which only demands that 
$g(w)$ is squarefree for every squarefree word $w \in A^*$ of length exactly 3. A short calculation 
reveals that this condition is equivalent to (i), because every squarefree word of length smaller 
than 3 occurs as a factor of a squarefree word of length 3. 
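
Corollary 2 turns the squarefreeness of a uniform morphism into a finite check. The following Python sketch illustrates this, assuming that a morphism is given as a dictionary from letters to image words; the function names are hypothetical and the square test is a plain brute-force search.

```python
from itertools import product


def contains_square(word):
    """Brute-force test: does some factor of `word` have the form uu with u non-empty?"""
    n = len(word)
    for period in range(1, n // 2 + 1):
        for i in range(n - 2 * period + 1):
            if word[i:i + period] == word[i + period:i + 2 * period]:
                return True
    return False


def squarefree_words(alphabet, length):
    """All squarefree words of the given length over the alphabet."""
    words = ("".join(letters) for letters in product(alphabet, repeat=length))
    return [w for w in words if not contains_square(w)]


def is_squarefree_uniform_morphism(g):
    """Criterion of Corollary 2: test the images of all squarefree words of length <= 3."""
    if len({len(image) for image in g.values()}) != 1:
        raise ValueError("Corollary 2 applies to uniform morphisms only")
    alphabet = sorted(g)
    for length in range(1, 4):
        for w in squarefree_words(alphabet, length):
            if contains_square("".join(g[a] for a in w)):
                return False
    return True


identity = {"a": "a", "b": "b", "c": "c"}
thue_morse = {"a": "ab", "b": "ba"}
print(is_squarefree_uniform_morphism(identity))    # the identity trivially preserves squarefreeness
print(is_squarefree_uniform_morphism(thue_morse))  # the image of ab is abba, which contains bb
```

The second example shows that the Thue-Morse morphism, although cubefree, is not squarefree.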

For the next characterisation, we need the notion of a pre-square with respect to a morphism 
$g$. Let $A$ be an alphabet, $w \in A^*$ a squarefree word and $g\colon A^* \to B^*$ a morphism. A factor $u \neq \varepsilon$ 
of $g(w) = \alpha u \beta$ is called a pre-square with respect to $g$, if there exists a word $w' \in A^*$ satisfying: 
$ww'$ is squarefree and $u$ is a prefix of $\beta g(w')$, or $w'w$ is squarefree and $u$ is a suffix of $g(w')\alpha$. 
Obviously, if $u$ is a pre-square, then either $g(ww')$ or $g(w'w)$ contains $u^2$ as a factor. 

Theorem 3 (Crochemore [9]). A morphism $g\colon A^* \to B^*$ is squarefree if and only if 

(i) $g(w)$ is squarefree for every squarefree word $w \in A^*$ of length $|w| \le 3$; 

(ii) for any $a \in A$, $g(a)$ does not have any internal pre-squares. 

It follows that, for a ternary alphabet $A$, a finite test-set exists, as specified in the following 
corollary. However, the subsequent theorem shows that, as soon as we consider an alphabet 
with $\mathrm{Card}(A) > 3$, no such finite test-sets exist, so the situation becomes more complex when 
considering larger alphabets. 

Corollary 4 (Crochemore [9]). Let $\mathrm{Card}(A) = 3$. A morphism $g\colon A^* \to B^*$ is squarefree if and 
only if $g(w)$ is squarefree for every squarefree word $w \in A^*$ of length $|w| \le 5$. 

Theorem 5 (Crochemore [9]). Let $\mathrm{Card}(A) > 3$. For any integer $n$, there exists a morphism 
$g\colon A^* \to B^*$ which is not squarefree, but maps all squarefree words of length up to $n$ to squarefree 
words. 

3.2 Characterisations of cubefree and $k$-powerfree morphisms 

We now move on to characterisations of cubefree and $k$-powerfree morphisms for $k > 3$. We start 
with a recent result on cubefree binary morphisms. 

Theorem 6 (Richomme, Wlazinski [23]). A set $T \subset \{a,b\}^*$ is a test-set for cubefree morphisms 
from $A^* = \{a,b\}^*$ to $B^*$ with $\mathrm{Card}(B) \ge 2$ if and only if $T$ is cubefree and $\mathrm{Fact}(T) \supset T_{\min}$, where 

\[ T_{\min} := \{abbabba, baabaab, ababba, babaab, abbaba, baabab, aabba, bbaab, abbaa, baabb, ababa, babab\}. \]

Obviously, the set $T_{\min}$ itself is a test-set for cubefree binary morphisms. Another test-set is 
the set of cubefree words of length 7, as each word of $T_{\min}$ appears as a factor of a word in this set. There 
are even single words which contain all the elements of $T_{\min}$ as factors. For instance, the cubefree 
word aabbababbabbaabaababaabb is one of the 56 words of length 24 which are test-sets for cubefree 
morphisms on $\{a, b\}$. The length of this word is optimal: no cubefree word of length 23 contains 
all the words of $T_{\min}$ as factors. 
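
The test-set of Theorem 6 is small enough to be used directly in a program. The sketch below, in Python, checks a binary morphism against the twelve words of $T_{\min}$ and verifies the claims made about the single word of length 24. The dictionary representation of morphisms and all function names are assumptions of this sketch; the criterion presupposes a target alphabet with at least two letters.

```python
T_MIN = ["abbabba", "baabaab", "ababba", "babaab", "abbaba", "baabab",
         "aabba", "bbaab", "abbaa", "baabb", "ababa", "babab"]

SINGLE_TEST_WORD = "aabbababbabbaabaababaabb"


def contains_cube(word):
    """Brute-force test: does some factor of `word` have the form uuu with u non-empty?"""
    n = len(word)
    for period in range(1, n // 3 + 1):
        for i in range(n - 3 * period + 1):
            u = word[i:i + period]
            if (word[i + period:i + 2 * period] == u
                    and word[i + 2 * period:i + 3 * period] == u):
                return True
    return False


def is_cubefree_binary_morphism(g):
    """Check a morphism on {a, b}* by testing the images of the words in T_MIN (Theorem 6)."""
    return all(not contains_cube("".join(g[c] for c in w)) for w in T_MIN)


# The word of length 24 quoted in the text is cubefree and has every word of T_MIN as a factor.
print(all(t in SINGLE_TEST_WORD for t in T_MIN))
print(not contains_cube(SINGLE_TEST_WORD))

# The Thue-Morse morphism, written on the letters a and b, passes the test.
print(is_cubefree_binary_morphism({"a": "01", "b": "10"}))
```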

The following sufficient characterisation of $k$-powerfree morphisms generalises Theorem 1 to 
integer powers $k > 2$. 

Theorem 7 (Bean et al. [5]). Let $g\colon A^* \to B^*$ be a morphism for alphabets $A$ and $B$ and let 
$k > 2$. Then $g$ is $k$-powerfree if 

(i) $g(w)$ is $k$-powerfree whenever $w \in A^*$ is $k$-powerfree and of length $|w| \le k + 1$; 

(ii) $a = b$ whenever $a, b \in A$ with $g(a)$ a factor of $g(b)$; 

(iii) the equality $x g(a) y = g(b) g(c)$, with $a, b, c \in A$ and $x, y \in B^*$, implies that either $x = \varepsilon$, 
$a = b$ or $y = \varepsilon$, $a = c$. 

As in the squarefree case above, a uniform morphism $g$ for which (i) holds also meets (ii), 
because uniformity implies that $g(a) = g(b)$ whenever $g(a)$ is a factor of $g(b)$. If $a \neq b$, the word 
$a^{k-1}b$ is $k$-powerfree but $g(a^{k-1}b) = g(a)^k$ is a $k$-power, which produces a contradiction. The 
condition (iii) means that, for all letters 
$a \in A$, the images $g(a)$ do not occur as an inner factor of $g(bc)$ for any $b, c \in A$. In general, 
this is not necessary for uniform morphisms; an example is given by the Thue-Morse morphism 
$g$ of Eq. (1). For instance, $g(00) = 0101 = 0g(1)1$, which violates condition (iii) in Theorem 7. 
Nevertheless, the Thue-Morse morphism is cubefree [1]. 

Alphabets with $\mathrm{Card}(B) < 2$ only provide trivial results, because the only $k$-powerfree morphism 
from $A^*$ to $\{\varepsilon\}^*$ is the empty morphism $\varepsilon$, and for $\mathrm{Card}(B) = 1$ the only additional 
morphism is the map for $\mathrm{Card}(A) = 1$ that maps the single element in $A$ to the single letter in $B$. 
From now on, we consider alphabets with $\mathrm{Card}(B) \ge 2$. First, we deal with the case $\mathrm{Card}(A) \ge 3$. 

Theorem 8 (Richomme, Wlazinski [23]). Given two alphabets $A$ and $B$ such that $\mathrm{Card}(A) \ge 3$ 
and $\mathrm{Card}(B) \ge 2$, and given any integer $k \ge 3$, there is no finite test-set for $k$-powerfree morphisms 
from $A^*$ to $B^*$. 

This again is a negative result, which shows that the general situation is difficult to handle. In 
general, no finite set of words suffices to verify the $k$-powerfreeness of a morphism. The situation 
improves if we restrict ourselves to uniform morphisms, and look for test-sets for this restricted 
class of morphisms only. Here, a test-set for $k$-powerfreeness of uniform morphisms on $A^*$ is a set 
$T \subset A^*$ such that, for every uniform morphism $g$ on $A^*$, $g$ is $k$-powerfree if and only if $g(T)$ is 
$k$-powerfree. 

The existence of finite test-sets of this type was recently established by Richomme and Wlazinski 
[28]. Let $\mathrm{Card}(A) \ge 2$ and $k \ge 3$ be an integer. Define 

\[ T^{(k)}(A) := U^{(k)}(A) \cup \bigl( F^{(k)}(A) \cap V^{(k)}(A) \bigr), \]

where $U^{(k)}(A)$ is the set of $k$-powerfree words over $A$ of length at most $k + 1$, and $V^{(k)}(A)$ 
is the set of words over $A$ that can be written in the form $a_0 w_1 a_1 w_2 \cdots a_{k-1} w_k a_k$ with letters 
$a_0, a_1, \ldots, a_k \in A$ and words $w_1, w_2, \ldots, w_k \in A^*$ which contain every letter of $A$ at most once and 
satisfy $\bigl| |w_i| - |w_j| \bigr| \le 1$. Obviously, this set is finite and comprises words with a maximum length 
of 

\[ \max\{ |w| : w \in T^{(k)}(A) \} \le k\,(\mathrm{Card}(A) + 1) + 1. \]

Theorem 9 (Richomme, Wlazinski [28]). Let $\mathrm{Card}(A) \ge 2$ and $k \ge 3$ be an integer. The finite 
set $T^{(k)}(A)$ is a test-set for $k$-powerfreeness of uniform morphisms on $A^*$. 

Due to the upper bound on the maximum length of words in $T^{(k)}(A)$, the following corollary 
is immediate. 

Corollary 10 (Richomme, Wlazinski [28]). A uniform morphism $g$ on $A^*$ is $k$-powerfree for an 
integer power $k \ge 3$ if and only if $g(w)$ is $k$-powerfree for all $k$-powerfree words $w$ of length at 
most $k\,(\mathrm{Card}(A) + 1) + 1$. 

Although this result provides an explicit test-set for $k$-powerfreeness, it is of limited practical 
use, simply because the test-set becomes large very quickly. Already for $\mathrm{Card}(A) = 4$ and $k = 3$, 
the set $T^{(3)}(A)$ has 26247020 elements. For comparison, the set of cubefree words in four letters 
of length 16, as required in Corollary 10, has 1939267560 elements, so is still much larger. 

Finally, let us quote the following result of Keränen [38], which characterises $k$-powerfree binary 
morphisms and indicates that the test-set of Theorem 9 is far from optimal. 

Theorem 11 (Keränen [38]). Let $g\colon \{a,b\}^* \to B^*$ be a uniform morphism with $g(a) \neq g(b)$ and 
primitive words $g(a)$, $g(b)$ and $g(ab)$. For every $k$-powerfree word $w \in \{a,b\}^*$, $g(w)$ is $k$-powerfree 
if and only if $g(v)$ is $k$-powerfree for every subword $v$ of $w$ with 

\[ |v| \le \begin{cases} 4 & \text{for } 3 \le k \le 6; \\ \tfrac{2}{3}(k + 1) & \text{for } k \ge 7. \end{cases} \]



4 Entropy of powerfree words 

Let $A$ be an alphabet. A subset $X \subset A^*$ is called factorial if for any word $x \in X$ all factors of 
$x$ are also contained in $X$. Define for a factorial subset $X \subset A^*$ the number of words of length 
$n$ occurring in $X$ by $c_X(n)$. This number gives some idea of the complexity of $X$: the larger the 
number of words of length $n$, the more diverse or complicated is the set. That is why $c_X\colon \mathbb{N} \to \mathbb{N}$ 
is called the complexity function of $X$. 

Definition 12. The (combinatorial) entropy of an infinite factorial set $X \subset A^*$ is defined by 

\[ h(X) := \lim_{n \to \infty} \frac{1}{n} \log c_X(n). \]

The requirement that $X$ is factorial ensures the existence of the limit, see for example [39, 
Lemma 1]. 

We note the following: 

(i) If $X \subset A^*$ with $\mathrm{Card}(A) = r$, then $1 \le c_X(n) \le r^n$ for all $n$, which implies $0 \le h(X) \le \log r$. 

(ii) If $X = A^*$ with $\mathrm{Card}(A) = r$, then $c_X(n) = r^n$ and $h(X) = \log r$. 

The set of $k$-powerfree words $F^{(k)}(A)$ over an alphabet $A$ is obviously a factorial subset of $A^*$, 
which is infinite for suitable values of $k$ for a given alphabet $A$. The precise value of the corresponding 
entropy, which coincides with the topological entropy [40], is not known, but lower and 
upper bounds exist for many cases. Recently, much improved upper and lower bounds have been 
established for $h(F^{(2)}(\{0,1,2\}))$ and $h(F^{(3)}(\{0,1\}))$, which will be outlined below. Generally it 
is easier to find upper bounds than to give lower bounds, due to the factorial nature of the set of 
$k$-powerfree words, so we start with describing several methods to produce upper bounds on the 
entropy. 

4.1 Upper bounds for the entropy 

A simple way to provide upper bounds is based on the enumeration of the set of $k$-powerfree 
words up to some length. Clearly, for the case of $r = \mathrm{Card}(A)$ letters, the number of words 
$c(n) := c_{F^{(k)}(A)}(n)$ is bounded by $r^n$, so the corresponding entropy is $h := h(F^{(k)}(A)) \le \log r$, as 
mentioned above. Suppose we know the actual value of $c(n)$ for some fixed $n$. Then, because every 
$k$-powerfree word of length $mn$ is a concatenation of $m$ $k$-powerfree words of length $n$ (the set 
$F^{(k)}(A)$ being factorial), 

\[ c(mn) \le c(n)^m \]

for any $m \ge 1$. Hence 

\[ h = \lim_{m \to \infty} \frac{\log c(mn)}{mn} \le \frac{\log c(n)}{n}, \tag{2} \]

which, for any $n$, yields an upper bound for $h$. Obviously, the larger the value of $n$, the better the 
bound obtained in this way. In some cases, the bound can be slightly improved by considering 
words that overlap in a couple of letters; see [39] for an example. 
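
To make the enumeration bound concrete, the following Python sketch counts binary cubefree words by appending one letter at a time and evaluates $\log c(n)/n$ for small $n$. It is a minimal illustration, not an optimised enumeration; the function names and the cut-off at length 16 are choices made for this sketch.

```python
import math


def ends_with_cube(word):
    """True if some suffix of `word` has the form uuu with u non-empty."""
    n = len(word)
    for period in range(1, n // 3 + 1):
        if (word[n - period:] == word[n - 2 * period:n - period]
                == word[n - 3 * period:n - 2 * period]):
            return True
    return False


def count_cubefree(max_length, alphabet="01"):
    """Return [c(1), ..., c(max_length)] for cubefree words over the alphabet.

    Every cubefree word of length n+1 extends a cubefree word of length n,
    so it suffices to check for cubes at the end of the extended word.
    """
    counts = []
    current = [""]
    for _ in range(max_length):
        current = [w + c for w in current for c in alphabet
                   if not ends_with_cube(w + c)]
        counts.append(len(current))
    return counts


counts = count_cubefree(16)
bounds = [math.log(c) / n for n, c in enumerate(counts, start=1)]
for n, (c, bound) in enumerate(zip(counts, bounds), start=1):
    print(n, c, round(bound, 6))
```

The list of words grows exponentially, so this direct approach is limited to moderate lengths; it nevertheless shows how slowly the bound $\log c(n)/n$ decreases with $n$.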

Sharper upper bounds can be produced by following a different approach, namely by considering 
a set of words that do not contain $k$-powers of a fixed finite set of words, for instance $k$-powers 
of all words up to a given length. This limitation means that the number of forbidden words is 
finite, and that the resulting factorial set contains the set of $k$-powerfree words and hence has an 
entropy that is at least as large, so the entropy of the former provides an upper bound. Again, by 
increasing the number of forbidden words, the bounds can be systematically improved. 

As Noonan and Zeilberger pointed out [41], it is possible to calculate the generating function 
for the numbers of words avoiding a finite set of forbidden words by solving a system of linear 
equations. The generating functions are rational functions, and the location of the pole closest 
to the origin determines the radius of convergence, and hence the entropy of the corresponding 
set of words. This approach has been applied in Ref. [26] to derive an upper bound for the set of 
squarefree words in three letters, and generating functions for cubefree words in two letters are 
discussed below. 

A related, though computationally easier approach is based on a Perron-Frobenius argument. 
It is sometimes referred to as the 'transfer matrix' or the 'cluster' approach. Here, a matrix is 
constructed, which determines how $k$-powerfree words of a given length can be concatenated to 
form $k$-powerfree words, and the growth rate is then determined by the maximum eigenvalue of 
this matrix. Both methods yield upper bounds that can be improved by increasing the length 
of the words involved, and in principle can approximate the entropy arbitrarily well, though in 
practice this is limited by the computational problem of computing the leading eigenvalue of a 
large matrix, or solving a large system of linear equations; see, for instance, [26, 27] for details. 
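
A transfer-matrix calculation of this kind can be sketched in a few lines of Python. The version below is a simplified variant and not the construction used in the cited references: it forbids cubes $u^3$ with $1 \le |u| \le p$ in binary words, uses the words of length $3p - 1$ that avoid these cubes as states, and estimates the leading eigenvalue by power iteration rather than by an exact eigenvalue computation. All names and the number of iterations are assumptions of this sketch.

```python
import math
from itertools import product


def ends_with_short_cube(word, p):
    """True if `word` ends with a cube uuu where 1 <= len(u) <= p."""
    for q in range(1, p + 1):
        if len(word) >= 3 * q and word[-q:] == word[-2 * q:-q] == word[-3 * q:-2 * q]:
            return True
    return False


def avoids_short_cubes(word, p):
    """True if no factor of `word` is a cube uuu with 1 <= len(u) <= p."""
    return not any(ends_with_short_cube(word[:i], p) for i in range(1, len(word) + 1))


def entropy_upper_bound(p, alphabet="01", iterations=400):
    """Logarithm of the growth rate of words avoiding all cubes of words of length <= p."""
    if p < 1:
        raise ValueError("p must be at least 1")
    memory = 3 * p - 1  # a new cube of period <= p involves at most the last 3p letters
    states = ["".join(w) for w in product(alphabet, repeat=memory)
              if avoids_short_cubes("".join(w), p)]
    weights = {s: 1.0 for s in states}
    growth = 0.0
    for _ in range(iterations):
        new_weights = dict.fromkeys(states, 0.0)
        for state, weight in weights.items():
            for c in alphabet:
                if not ends_with_short_cube(state + c, p):
                    new_weights[(state + c)[1:]] += weight
        growth = sum(new_weights.values())
        # Normalise so that `growth` is the ratio of successive total weights.
        weights = {s: w / growth for s, w in new_weights.items()}
    return math.log(growth)


print(round(entropy_upper_bound(1), 6))
print(round(entropy_upper_bound(2), 6))
```

Power iteration converges here because the relevant part of the transfer matrix is assumed to be primitive; for larger $p$ the number of states grows quickly, which is the computational limitation mentioned above.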



4.2 Lower bounds for the entropy 

Until very recently, all methods used to prove that the entropy of $k$-powerfree words is positive 
and to establish lower bounds on the entropy were based on $k$-powerfree morphisms. Clearly, a 
$k$-powerfree morphism, iterated on a single letter, produces $k$-powerfree words of increasing length 
and suffices to show the existence of infinite $k$-powerfree words. For example, the fact that the 
Thue-Morse morphism (1) is cubefree shows the existence of cubefree words of arbitrary length in 
two letters. To prove that the entropy is actually positive, one has to show that the number of 
$k$-powerfree words grows exponentially with their length. Essentially, this is achieved by considering 
$k$-powerfree morphisms from a larger alphabet. The following theorem is a generalisation of 
Brandenburg's method, compare [11], and provides a path to produce lower bounds for the entropy 
of $k$-powerfree words. 

Theorem 13. Let $A$ and $B$ be alphabets with $\mathrm{Card}(A) = r\,\mathrm{Card}(B)$, where $r > 1$ is an integer. 
If there exists an $\ell$-uniform $k$-powerfree morphism $g\colon A^* \to B^*$, then 

\[ h(F^{(k)}(B)) \ge \frac{\log r}{\ell - 1}. \]

Proof. For this proof define $h := h(F^{(k)}(B))$, $c(n) := c_{F^{(k)}(B)}(n)$ and $s := \mathrm{Card}(B)$. Label the 
elements of $A$ as $\{a_{11}, \ldots, a_{1r}, a_{21}, \ldots, a_{2r}, \ldots, a_{s1}, \ldots, a_{sr}\}$ and the elements of $B$ as $\{b_1, \ldots, b_s\}$. 
Define the map $\phi\colon A^* \to B^*$ as $\phi(a_{ij}) := b_i$ for $i = 1, \ldots, s$ and $j = 1, \ldots, r$. Hence $\mathrm{Card}(\phi^{-1}(b_i)) = r$. 
Every $k$-powerfree word of length $m$ over $B$ has $r^m$ different preimages under $\phi$ which, by construction, 
consist only of $k$-powerfree words. These words are mapped by $g$, which is injective due to its 
$k$-powerfreeness, to different $k$-powerfree words of length $m\ell$ over $B$. This implies the inequality 

\[ c(m\ell) \ge r^m c(m) \tag{3} \]

for any $m > 0$. This means 

\[ \ell\, \frac{\log c(m\ell)}{m\ell} - \frac{\log c(m)}{m} \ge \log r \]

for any $m > 0$. Taking the limit as $m \to \infty$ gives 

\[ (\ell - 1)\, h \ge \log r, \]

thus establishing the lower bound. □ 



This result means that, whenever we can find a uniform $k$-powerfree morphism from a sufficiently 
large alphabet, it provides a lower bound for the entropy. Clearly, the larger $r$ and the 
smaller $\ell$ the better the bound, so one is particularly interested in uniform $k$-powerfree morphisms 
from large alphabets of minimal length. 
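
The trade-off between $r$ and $\ell$ is easy to explore numerically. The following Python sketch simply evaluates the bound $\log r / (\ell - 1)$ of Theorem 13; the function name and the parameter values in the example are hypothetical and serve only to show how the bound behaves, they do not refer to specific morphisms from the literature.

```python
import math


def morphism_lower_bound(r, ell):
    """Lower bound log(r) / (ell - 1) on the entropy, as in Theorem 13.

    r   : ratio Card(A) / Card(B), an integer greater than 1
    ell : common length of the images under the uniform morphism, at least 2
    """
    if r < 2 or ell < 2:
        raise ValueError("need r >= 2 and ell >= 2")
    return math.log(r) / (ell - 1)


# Hypothetical parameter values, chosen only for illustration.
for r, ell in [(2, 10), (2, 20), (4, 20), (16, 40)]:
    print(r, ell, round(morphism_lower_bound(r, ell), 6))
```

Doubling $\ell$ roughly halves the bound unless $r$ can be squared at the same time, which is why morphisms from large alphabets with short images are the interesting ones.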

Another method due to Brinkhuis [12], which is related to Brandenburg's method, can be 
generalised as follows. Let again $B = \{b_1, \ldots, b_s\}$ be an alphabet and $r \in \mathbb{N}$. For $i = 1, \ldots, s$ let 

\[ U_i = \{u_{i,1}, u_{i,2}, \ldots, u_{i,r}\} \]

with $u_{i,j} \in F^{(k)}_{\ell}(B)$, where the latter denotes the words in $F^{(k)}(B)$ which have length $\ell$. The set 
$U = \{U_1, \ldots, U_s\}$ is called a $(k, \ell, r)$-Brinkhuis-set if the $\ell$-uniform substitution (in the context of 
formal language theory) $\phi$ from $B^*$ to itself defined by 

\[ \phi\colon b_i \mapsto U_i \quad \text{for } i = 1, \ldots, s \]

has the property $\phi(F^{(k)}(B)) \subset F^{(k)}(B)$. In other words, $U$ is a $(k, \ell, r)$-Brinkhuis-set if the 
substitution of every letter $b_i$, occurring in a $k$-powerfree word, by an element of $U_i$ results in a 
$k$-powerfree word over $B$. The existence of a $(k, \ell, r)$-Brinkhuis-set delivers the lower bound 

\[ h(F^{(k)}(B)) \ge \frac{\log r}{\ell - 1}, \]

because every $k$-powerfree word of length $m$ is mapped to $r^m$ $k$-powerfree words of length $\ell m$; 
compare Eq. (3). 

The method of Brinkhuis is stronger than the method of Brandenburg. Not every $(k, \ell, r)$-Brinkhuis-set 
implies a map according to Theorem 13; see [42, p. 287] for an example. Conversely, 
if there exists an $\ell$-uniform $k$-powerfree morphism $g\colon A^* \to B^*$ according to Theorem 13, then 
there exists a $(k, \ell, r)$-Brinkhuis-set, namely $U_i = \{g(a_{i1}), \ldots, g(a_{ir})\}$ for $i = 1, \ldots, s$, with the 
notation of Theorem 13. 

Brinkhuis' method was applied in Refs. [43, 21, 44]; see also below for a summary of bounds 
obtained for binary cubefree and ternary squarefree words. These bounds have in common that 
they are nowhere near the actual value of the entropy, and while a systematic improvement is 
possible by increasing the value of $r$ in Theorem 13 (which, however, also means that one has to 
consider larger values of $\ell$), it will always result in a much smaller growth rate, because only a 
subset of words is obtained in this way. 

Recently, a different approach has been proposed [29], based essentially on the derivation of 
an inequality 

\[ S_m(n + 1) \ge \alpha\, S_m(n) \tag{4} \]

for the weighted sum $S_m(n)$ of the number of elements in a certain subset (which depends on the 
choice of $m \in \mathbb{N}$) of squarefree (resp. cubefree) words of length $n$ over a ternary (resp. binary) 
alphabet and a parameter $\alpha > 1$ which satisfies two inequalities for $i = m, m + 1, \ldots, n - 1$. 
The estimation of $S_m(n + 1)$ starts from a Perron-Frobenius argument and concludes with the 
observation that the order of growth of the number of squarefree (resp. cubefree) words cannot be 
less than the order of growth of $S_m(n)$, which is $\alpha$. This implies 

\[ h(F^{(k)}(A)) \ge \log(\alpha) \]

for $k = 2$ and $A = \{0, 1, 2\}$ or $k = 3$ and $A = \{0, 1\}$, with the corresponding values for $\alpha$. In the 
end, this method leads to a recipe to check, with a computer, several conditions for the parameters 
(including $m$), which ensure that the inequality for $S_m$ holds. By increasing the parameter $m$, 
it appears to be possible to estimate the growth rate of cubefree and squarefree words with an 
arbitrary precision. For details, we refer the reader to Ref. [29]. 

5 Bounds on the entropy of binary cubefree and ternary 
squarefree words 

We now consider the two main examples, binary cubefree and ternary squarefree words, in more 
detail, reviewing the bounds derived by the various approaches mentioned above. We start with 
the discussion of binary cubefree words, and then give a brief summary of the analogous results 
for ternary squarefree words. 

5.1 Binary cubefree words 

Define for this section $b(n) := c_{F^{(3)}(\{0,1\})}(n)$ as the number of binary cubefree words of length $n$ 
and $h := h(F^{(3)}(\{0,1\}))$ as the entropy of cubefree words over the alphabet $\{0, 1\}$. The values 
for $b(n)$ with $n \le 47$ are given in [45]; an extended list for $n \le 80$ is shown in Table 1. They were 
obtained by a straightforward iterative construction of cubefree words, appending a single letter 
at a time. According to the enumeration bound of Section 4.1, Eq. (2), the corresponding upper limit 
for the entropy $h$ is 

\[ h \le \frac{\log b(80)}{80} \approx 0.389855. \]

For comparison, the limit obtained using the number of words of length 79 is 0.390020, which 
indicates that these limits are still considerably larger than the actual value of $h$. As in the case of 
ternary squarefree words [26], the asymptotic behaviour of $b(n)$ fits a simple form $b(n) \sim A x_c^{n}$ as 
$n \to \infty$, pointing at a simple pole as the dominating singularity of the corresponding generating 
function at $x = x_c^{-1}$. The estimated values of the coefficients are $A \approx 2.847$ and $x_c \approx 1.4575773$, 
leading to a numerical estimate of $h = \log(x_c) \approx 0.3767757$ for the entropy. 
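
These numbers can be checked with a few lines of Python from the last two entries of Table 1. The sketch below uses the plain ratio $b(80)/b(79)$ as a crude estimate of the growth rate; this is an assumption made for illustration and not necessarily the fitting procedure behind the values quoted above.

```python
import math

# Values of b(79) and b(80) as listed in Table 1.
B79 = 24060906866922
B80 = 35070631260904

upper_bound_79 = math.log(B79) / 79   # enumeration bound log b(n) / n for n = 79
upper_bound_80 = math.log(B80) / 80   # the same bound for n = 80

growth_estimate = B80 / B79                       # crude estimate of the growth rate
entropy_estimate = math.log(growth_estimate)      # corresponding estimate of h
amplitude_estimate = B80 / growth_estimate ** 80  # estimate of the prefactor A

print(round(upper_bound_79, 6), round(upper_bound_80, 6))
print(round(growth_estimate, 7), round(entropy_estimate, 7))
print(round(amplitude_estimate, 3))
```

The ratio of successive counts is already much closer to the estimated growth rate than the enumeration bound, because the latter is dominated by the prefactor $A$ at these lengths.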

Let us compare this with the upper limit derived from the generating function of the number of 
binary length-$p$ cubefree words. To this end, let $b_p(n) := c_{F^{(3,p)}(\{0,1\})}(n)$ denote the number of 
length-$p$ cubefree words, and define 

\[ B_p(x) = \sum_{n=0}^{\infty} b_p(n)\, x^n \tag{5} \]

to be the generating function for the number of binary length-$p$ cubefree words. These functions 
of $x$ are rational [41]. The first few generating functions read 

\[ B_0(x) = \frac{1}{1 - 2x} = 1 + 2x + 4x^2 + 8x^3 + 16x^4 + 32x^5 + 64x^6 + \ldots, \]

\[ B_1(x) = \frac{1 + x + x^2}{1 - x - x^2} = 1 + 2x + 4x^2 + 6x^3 + 10x^4 + 16x^5 + 26x^6 + \ldots, \]

\[ B_2(x) = \frac{1 + 2x + 3x^2 + 3x^3 + 3x^4 + 3x^5 + 2x^6}{1 - x^2 - x^3 - x^4 - x^5} = 1 + 2x + 4x^2 + 6x^3 + 10x^4 + 16x^5 + 24x^6 + \ldots \]

The degrees of the numerator and denominator polynomials for $p \le 14$ are given in Table 2. The 
generating functions $B_p(x)$ have a finite radius of convergence, determined by the location of the 
zero $x_c$ of its denominator polynomial which lies closest to the origin. A plot of the location of 
poles of $B_{14}(x)$ is shown in Figure 1. It very much resembles the analogous distribution for ternary 
squarefree words [26]; again, the poles seem to accumulate, with increasing $p$, on or near the unit 
circle, which may indicate the presence of a natural boundary beyond which the generating function 
for cubefree binary words (corresponding to taking $p \to \infty$) cannot be analytically continued; see 
[26] for a discussion of this phenomenon in the case of ternary squarefree words. 

Table 1: The number $b(n)$ of binary cubefree words of length $n$ for $n \le 80$. 

 n  b(n)    n    b(n)    n        b(n)    n           b(n)
 1     2   21    7754   41    14565048   61    27286212876
 2     4   22   11320   42    21229606   62    39771765144
 3     6   23   16502   43    30943516   63    57970429078
 4    10   24   24054   44    45102942   64    84496383550
 5    16   25   35058   45    65741224   65   123160009324
 6    24   26   51144   46    95822908   66   179515213688
 7    36   27   74540   47   139669094   67   261657313212
 8    56   28  108664   48   203577756   68   381385767316
 9    80   29  158372   49   296731624   69   555899236430
10   118   30  230800   50   432509818   70   810266077890
11   174   31  336480   51   630416412   71  1181025420772
12   254   32  490458   52   918879170   72  1721435861086
13   378   33  714856   53  1339338164   73  2509125828902
14   554   34 1041910   54  1952190408   74  3657244826158
15   802   35 1518840   55  2845468908   75  5330716904964
16  1168   36 2213868   56  4147490274   76  7769931925578
17  1716   37 3226896   57  6045283704   77 11325276352154
18  2502   38 4703372   58  8811472958   78 16507465616784
19  3650   39 6855388   59 12843405058   79 24060906866922
20  5324   40 9992596   60 18720255398   80 35070631260904

As a consequence of Pringsheim's theorem [46, Sec. 7.2], there is a dominant singularity on the 
positive real axis; we denote the position of the singularity by $x_c$. For the cases we considered, 
this simple pole appears to be the only dominant singularity. Since the radius of convergence of 
the power series $B_p(x)$ is given by $\bigl( \limsup_{n \to \infty} \sqrt[n]{b_p(n)} \bigr)^{-1}$, the entropy $h_p$ of the set of binary 
length-$p$ cubefree words is $h_p = -\log x_c$. Clearly, $h_p \ge h_{p'}$ for $p \le p'$, and $h = \lim_{p \to \infty} h_p$, so for 
any finite $p$ the entropy $h_p$ provides an upper bound of the entropy $h$ of binary cubefree words. The 
values of the entropy $h_p$ for $p \le 14$ are given in Table 2. As was observed for ternary squarefree 
words [26], the values appear to converge very quickly with increasing $p$, but it is difficult to 
extract a reliable estimate of the true value of the entropy without making assumptions on the 
asymptotic behaviour. 
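
The relation between a rational generating function, its series coefficients and the entropy $h_p = -\log x_c$ can be illustrated for the two smallest non-trivial cases. In the Python sketch below, the polynomials for $B_1$ and $B_2$ are written as coefficient lists in increasing powers of $x$; they are chosen to be consistent with the series expansions quoted in the text and should be read as assumptions of this sketch. The helper names and the use of bisection are likewise illustrative.

```python
import math


def series(numerator, denominator, terms):
    """First Taylor coefficients of numerator(x) / denominator(x); requires denominator[0] == 1."""
    coeffs = []
    for n in range(terms):
        value = numerator[n] if n < len(numerator) else 0
        for j in range(1, min(n, len(denominator) - 1) + 1):
            value -= denominator[j] * coeffs[n - j]
        coeffs.append(value)
    return coeffs


def root_in_unit_interval(poly, steps=100):
    """Bisection on [0, 1]; assumes poly changes sign exactly once on this interval."""
    def evaluate(x):
        return sum(c * x ** i for i, c in enumerate(poly))
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if evaluate(lo) * evaluate(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Assumed rational forms: B_1 = (1 + x + x^2) / (1 - x - x^2) and
# B_2 = (1 + 2x + 3x^2 + 3x^3 + 3x^4 + 3x^5 + 2x^6) / (1 - x^2 - x^3 - x^4 - x^5).
B1_NUM, B1_DEN = [1, 1, 1], [1, -1, -1]
B2_NUM, B2_DEN = [1, 2, 3, 3, 3, 3, 2], [1, 0, -1, -1, -1, -1]

h1 = -math.log(root_in_unit_interval(B1_DEN))
h2 = -math.log(root_in_unit_interval(B2_DEN))

print(series(B1_NUM, B1_DEN, 7), round(h1, 6))
print(series(B2_NUM, B2_DEN, 7), round(h2, 6))
```

For larger $p$ the denominators have many more roots, and a general root finder would be needed to locate the one closest to the origin.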

Already in 1983, Brandenburg fTT] showed that 

2 • 2t <^ b{n) < 2 • 1251T? 

which leads in our setting to 0.07701 ^ ^ 0.41952. The currently best upper bounds are 
due to Edlin [45] and Ochem and Reix [27 . Analysing length-15 cubefree words up to a finite 
length, Edlin [45] arrives at the bound of h ^ 0.376777 (which is what we would expect to find 
if we extended Tabled to n = 15, but this would require huge computational effort to compute 
the corresponding generating function completely), while using the transfer matrix (or cluster) 
approach described above, Ochem and Reix obtained an upper bound on the growth rate of 
1.45758131, which corresponds to the bound 

h 0.3767784 






Table 2: The entropy $h_p$ of binary length-$p$ cubefree words, obtained from the radius of convergence
of the generating functions $B_p(x)$. Here, $d_{\mathrm{num}}$ and $d_{\mathrm{den}}$ denote the degree
of the polynomial in the numerator and denominator of $B_p(x)$, respectively.

   p    d_num    d_den    h_p
   0        0        1    0.693147
   1        2        2    0.481212
   2        6        5    0.427982
   3       21       13    0.394948
   4       29       17    0.385103
   5       43       25    0.380594
   6       85       57    0.378213
   7      127       99    0.377332
   8      165      127    0.377179
   9      300      254    0.376890
  10      450      395    0.376835
  11      569      513    0.376811
  12     1098     1031    0.376790
  13     1750     1656    0.376783
  14     2627     2540    0.376779

on the entropy. 
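
The conversion between exponential bounds on $b(n)$ and bounds on the entropy is a one-line computation: a counting function that grows like $\mu^n$ has entropy $\log \mu$. The following Python sketch, with the hypothetical helper `entropy_from_growth_rate`, checks the numbers quoted above under that assumption.

```python
import math

def entropy_from_growth_rate(mu):
    """Entropy h = log(mu) for a counting function growing like mu**n."""
    return math.log(mu)

# Growth rates read off from Brandenburg's bounds: 2**(1/9) and 1251**(1/17).
lower = entropy_from_growth_rate(2 ** (1 / 9))
upper = entropy_from_growth_rate(1251 ** (1 / 17))

# Growth-rate bound quoted for Ochem and Reix.
best_upper = entropy_from_growth_rate(1.45758131)

print(lower, upper, best_upper)
```

The prefactors 2 in Brandenburg's bounds do not affect the entropy, since they disappear when taking $\frac{1}{n}\log b(n)$ in the limit of large $n$.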

We now move on to the lower bound and cubefree morphisms. We have already seen one
example above, the Thue-Morse morphism, which is a cubefree morphism from a binary alphabet
to a binary alphabet. As explained above, it is also useful to find uniform cubefree morphisms from
larger alphabets, because these provide lower bounds on the entropy. Clearly, if we have a uniform
cubefree morphism $\varrho\colon A^* \to \{0,1\}^*$ of length $\ell$, with $\mathrm{Card}(A) = r$, it is
completely specified by the $r$ words $w_i$, $i = 1, \ldots, r$, which are the images of the letters in
$A$. Since any permutation of the letters in $A$ will again yield a uniform cubefree morphism, the set
$\{w_1, \ldots, w_r\} \subset \{0,1\}^{\ell}$ of generating words determines the morphism up to
permutation of the letters in $A$.

Moreover, the set $\{\overline{w}_1, \ldots, \overline{w}_r\}$, where $\overline{w}$ denotes the image
of $w$ under the permutation $0 \leftrightarrow 1$, also defines a cubefree morphism, as does
$\{\widetilde{w}_1, \ldots, \widetilde{w}_r\}$, where $\widetilde{w}$ denotes the reversal of $w$, i.e.,
the word $w$ read backwards. This is obvious because the test-sets of Theorem 9 are invariant
under these operations. Unless the words are palindromic (which means that $w = \widetilde{w}$), the
set $\{w_1, \ldots, w_r\}$ thus represents four different morphisms (not taking into account permutation
of letters in $A$), the fourth obtained by performing both operations, yielding
$\{\widetilde{\overline{w}}_1, \ldots, \widetilde{\overline{w}}_r\}$.

For cubefree morphisms from a three-letter alphabet $A$ to two letters, one needs words of
length at least six. For length six, there are twelve inequivalent (with respect to permutation
of letters in $A$) cubefree morphisms. The corresponding sets of generating words are

{Wi,W2,W4}, {W2,W^,W^}, {W2,W^,W^}, 

and the corresponding images under the two operations explained above. Here, the four words are

$$w_1 = 001011, \quad w_2 = 001101, \quad w_3 = 010110, \quad w_4 = 011001.$$

It turns out that none of these morphisms actually satisfies the sufficient criterion of Theorem 7,
but cubefreeness was verified using the test-set of Theorem 9.
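
A simple necessary check for cubefreeness of a morphism can be sketched in a few lines of Python: apply the morphism to all cubefree words up to some length and test whether the images are cubefree. This is only a sanity check under the assumption of single-character letters; it is not the test-set of Theorem 9, and passing it does not prove that a morphism is cubefree. The function names are illustrative.

```python
from itertools import product

def is_cubefree(word):
    """True if no factor of `word` has the form uuu with u non-empty."""
    n = len(word)
    for q in range(1, n // 3 + 1):
        for i in range(n - 3 * q + 1):
            if word[i:i + q] == word[i + q:i + 2 * q] == word[i + 2 * q:i + 3 * q]:
                return False
    return True

def apply(morphism, word):
    """Image of `word` under a morphism given as a dict letter -> word."""
    return "".join(morphism[a] for a in word)

def images_cubefree(morphism, max_len):
    """Necessary check: images of all cubefree words up to max_len are cubefree."""
    letters = sorted(morphism)
    for n in range(1, max_len + 1):
        for tup in product(letters, repeat=n):
            word = "".join(tup)
            if is_cubefree(word) and not is_cubefree(apply(morphism, word)):
                return False
    return True

thue_morse = {"0": "01", "1": "10"}
not_cubefree = {"0": "00", "1": "1"}      # maps the cubefree word 00 to 0000

print(images_cubefree(thue_morse, 8))
print(images_cubefree(not_cubefree, 8))
```

The second morphism fails already on a word of length two, which shows how quickly such a check can rule out candidates before a rigorous criterion is applied.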

One has to go to length nine to find cubefree morphisms from four to two letters. There are
16 inequivalent morphisms with respect to permutations of the four letters. Explicitly, they are
given by the generating sets

{wi,W2,W2,W3}, {wa,Wg,W7,W9}, {w5,W5,Ws,Ws}, {w5 ,wE, Ws ,Ws} , {we, W7 ,Ws, Wg} (6) 

Figure 1: Location of poles of the generating function $B_{14}(x)$.


with words

$$w_1 = 001001101, \quad w_2 = 001010011, \quad w_3 = 001011001,$$
$$w_4 = 001101001, \quad w_5 = 010010110, \quad w_6 = 010011010,$$
$$w_7 = 010100110, \quad w_8 = 011001001, \quad w_9 = 011010110.$$

Note that $w_9$ is a palindrome (it coincides with its reversal), and that two of the five sets are
invariant under the permutation $0 \leftrightarrow 1$, which explains why they only represent 16
different morphisms.
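
The two operations on generating words, and the way palindromes reduce the number of distinct variants, can be made concrete with a short Python sketch. The helper names `complement`, `reversal` and `variants` are illustrative; the nine words are those listed above.

```python
WORDS = {
    "w1": "001001101", "w2": "001010011", "w3": "001011001",
    "w4": "001101001", "w5": "010010110", "w6": "010011010",
    "w7": "010100110", "w8": "011001001", "w9": "011010110",
}

def complement(word):
    """Image of `word` under the letter permutation 0 <-> 1."""
    return word.translate(str.maketrans("01", "10"))

def reversal(word):
    """The word read backwards."""
    return word[::-1]

def variants(words):
    """The (up to four) generating sets obtained from the two operations."""
    ops = [lambda w: w, complement, reversal,
           lambda w: complement(reversal(w))]
    return {frozenset(op(w) for w in words) for op in ops}

palindromes = [name for name, w in WORDS.items() if w == reversal(w)]
print(palindromes)
print(len(variants([WORDS["w1"]])), len(variants([WORDS["w9"]])))
```

A generic word yields four distinct variants, whereas the palindrome yields only two; the same mechanism applies to whole generating sets that are invariant under one of the operations.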

Beyond four letters, the test-set of Theorem 9 becomes unwieldy, but the sufficient criterion of
Theorem 7 can be used to obtain morphisms. However, these may not have the optimal length, as
the examples here show: again, for length nine all morphisms violate the conditions of Theorem 7.
Still, this need not be the case; for instance, morphisms from a five-letter alphabet that satisfy
the sufficient criterion exist for length 12, which in this case is the optimal length.

As a consequence of Theorem 13, the morphisms (6) from a four-letter alphabet show that the
entropy of cubefree binary words is positive, and that

$$h \ge \frac{\log 2}{8} \approx 0.08664.$$

Using the sufficient condition, this bound can be improved. For instance, for length 15, one can
find cubefree morphisms from 10 letters, which yields a lower bound of

$$h \ge \frac{\log 5}{14} \approx 0.11496.$$
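
The two lower bounds above follow the same pattern, which the following Python sketch makes explicit. It assumes that the bound of Theorem 13 has the form $\log(r)/(\ell - 1)$, where $r$ is the ratio of the alphabet sizes and $\ell$ the length of the uniform morphism; this form is inferred from the numerical values quoted in the text, and the function name is hypothetical.

```python
import math

def morphism_lower_bound(card_a, card_b, length):
    """Lower bound log(r) / (length - 1) with r = card_a / card_b.

    Assumed form of the bound of Theorem 13, inferred from the
    numerical values quoted in the text.
    """
    r = card_a // card_b
    return math.log(r) / (length - 1)

print(morphism_lower_bound(4, 2, 9))     # four letters, length 9
print(morphism_lower_bound(10, 2, 15))   # ten letters, length 15
```

Increasing the number of letters helps only if the required morphism length does not grow too fast, which is why the second bound improves on the first despite the longer words.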

However, a large step towards closing the gap between these lower bounds and the upper bound was
achieved by the work of Kolpakov [55]. With his approach, a lower bound of

$$h \ge 0.37676,$$

which is the best lower bound so far, has been established. The difference between this bound and
the upper bound 0.3767784 by Ochem and Reix [27] is of the order of $10^{-5}$, showing the huge
improvement over the previously available estimates.

5.2 Ternary squarefree words



Denote by $a(n) := c_{\mathcal{F}^{(2)}(\{0,1,2\})}(n)$ the number of ternary squarefree words of
length $n$, and by $a_p(n)$ the number of length-$p$ squarefree words of length $n$. For this section,
let $h := h(\mathcal{F}^{(2)}(\{0,1,2\}))$ be the entropy of squarefree words over the alphabet
$\{0,1,2\}$. See [39] for a list of $a(n)$ for $n \le 90$ and [21] for $91 \le n \le 110$. The
generating functions are defined as in the binary cubefree case. The first four of them are stated in
[26, Sec. 3], which also contains a list of their radii of convergence for $p \le 24$. Already in
1983, Brandenburg [11] showed that

$$6 \cdot 2^{n/22} \le a(n) \le 6 \cdot 1172^{(n-2)/22},$$

which leads in our setting to

$$0.03151 \le h \le 0.32120.$$

In 1999, Noonan and Zeilberger [41] lowered the upper bound to 0.26391 by means of generating
functions for the number of words avoiding squares of up to length 23. Grimm and Richard [26]
used the same method to improve the upper bound to 0.263855. At the moment, the best known
upper bound is 0.263740, which was established by Ochem in 2006 using an approach based on the
transfer matrix (or cluster) method; see [57] for details.

In 1998, Zeilberger showed that a Brinkhuis pair of length 18 exists, which by Theorem 13
implies that the entropy is bounded by $h \ge \log(2)/17 \approx 0.04077$ [47]. By going to larger
alphabets, this was subsequently improved to $h \ge \log(65)/40 \approx 0.10436$ by Grimm [21] and
$h \ge \log(110)/42 \approx 0.11192$ by Sun [H]. Again, the recent work of Kolpakov [29] has made a
large difference to the lower bounds; he achieved the best current lower bound, which is
$h > 0.26369$. The difference between the best known upper and lower bounds is now just
$5 \times 10^{-5}$.



6 Letter frequencies 

For a finite word $w$ of length $n$, the frequency of the letter $a$ is $\#_a(w)/n \in [0,1]$, where
$\#_a(w)$ denotes the number of occurrences of the letter $a$ in $w$. In general, infinite
$k$-powerfree words need not have well-defined letter frequencies. However, we can define upper and
lower frequencies $f_a^+ \ge f_a^-$ of a letter $a \in A$ in a word $w \in A^{\mathbb{N}}$ as

$$f_a^+ := \sup_{\{w_n\}} \limsup_{n\to\infty} \frac{\#_a(w_n)}{n}, \qquad
  f_a^- := \inf_{\{w_n\}} \liminf_{n\to\infty} \frac{\#_a(w_n)}{n},$$

where $w_n$ is an $n$-letter subword of $w$. Here, we take the supremum and infimum over all
sequences $\{w_n\}$. Alternatively, we can compute these frequencies from
$a_n^+ = \max_{w_n \subset w} \#_a(w_n)$ and $a_n^- = \min_{w_n \subset w} \#_a(w_n)$ by
$f_a^{\pm} = \lim_{n\to\infty} a_n^{\pm}/n$. The limits exist due to the subadditivity of the
sequences $\{a_n^+\}$ and $\{1 - a_n^-\}$. If the infinite word $w$ is such that
$f_a^+ = f_a^- =: f_a$, we call $f_a$ the frequency of the letter $a$ in $w$.
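
The quantities $a_n^+$ and $a_n^-$ can be computed with a sliding window. The following Python sketch does this for a finite prefix of the Thue-Morse word, generated by iterating $0 \mapsto 01$, $1 \mapsto 10$; since only a prefix is inspected, the results are strictly speaking bounds on $a_n^{\pm}$ for the infinite word. The function names are illustrative.

```python
def thue_morse_prefix(length):
    """Prefix of the Thue-Morse word, generated by iterating 0 -> 01, 1 -> 10."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if a == "0" else "10" for a in word)
    return word[:length]

def extremal_counts(word, n, letter="0"):
    """Maximum and minimum number of `letter` over all n-letter subwords."""
    count = word[:n].count(letter)
    a_plus = a_minus = count
    for i in range(n, len(word)):
        # Slide the window by one position and update the count.
        count += (word[i] == letter) - (word[i - n] == letter)
        a_plus, a_minus = max(a_plus, count), min(a_minus, count)
    return a_plus, a_minus

w = thue_morse_prefix(4096)
for n in (16, 64, 256):
    a_plus, a_minus = extremal_counts(w, n)
    print(n, a_plus / n, a_minus / n)
```

For the Thue-Morse word both ratios approach 1/2, because every window of even length differs from a concatenation of blocks 01 and 10 by at most one letter at each end.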

The requirement that a word is $k$-powerfree for some $k$ restricts the possible letter frequencies.
For instance, for cubefree binary words, there cannot be three consecutive zeros, and hence
the frequency of the letter 0 is certainly bounded from above by 2/3. Due to symmetry under
permutation of letters, it is bounded from below by 1/3. In a similar way, considering maximum
and minimum frequencies of letters in finite $k$-powerfree words produces bounds on the possible
(upper and lower) frequencies of letters in infinite words. It is of interest to determine at which
letter frequency $k$-powerfree words cease to exist, and how the entropy of $k$-powerfree words
depends on the letter frequency. To answer these questions, $k$-powerfree morphisms are exploited
once again, and in two ways. Firstly, the argument using frequencies in finite words only produces
'negative' results, in the sense that one can exclude the existence of $k$-powerfree words for certain
ranges of the frequency. To show that $k$-powerfree words of a certain frequency actually exist, these
are produced as fixed points of $k$-powerfree morphisms. The letter frequency for an infinite word
obtained as a fixed point of a morphism $\varrho$ on the alphabet $A = \{a_1, a_2, \ldots, a_m\}$ is
well-defined, and obtained from the (statistically normalised) right Perron-Frobenius eigenvector of
the associated $m \times m$ substitution matrix $M$ with elements $M_{ij} = \#_{a_i}(\varrho(a_j))$;
see for instance [55]. For example, for the Thue-Morse morphism, the substitution matrix is
$M = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ with Perron-Frobenius eigenvalue 2 and corresponding
eigenvector $(\frac{1}{2}, \frac{1}{2})^T$, so both letters occur with frequency 1/2 in
the infinite Thue-Morse word.
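
As a minimal sketch of this construction, the following Python code builds the substitution matrix of the Thue-Morse morphism $0 \mapsto 01$, $1 \mapsto 10$, checks the eigenvector equation for $(\frac{1}{2}, \frac{1}{2})^T$, and compares with the empirical letter frequencies in a long prefix of the fixed point. The helper names are illustrative, and letters are assumed to be single characters.

```python
def substitution_matrix(morphism, alphabet):
    """M[i][j] = number of occurrences of alphabet[i] in the image of alphabet[j]."""
    return [[morphism[b].count(a) for b in alphabet] for a in alphabet]

def iterate(morphism, seed, steps):
    """Apply the morphism `steps` times to the word `seed`."""
    word = seed
    for _ in range(steps):
        word = "".join(morphism[a] for a in word)
    return word

alphabet = "01"
thue_morse = {"0": "01", "1": "10"}
M = substitution_matrix(thue_morse, alphabet)

# Check that (1/2, 1/2) is an eigenvector of M with eigenvalue 2.
v = [0.5, 0.5]
Mv = [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

word = iterate(thue_morse, "0", 12)              # prefix of length 2**12
empirical = [word.count(a) / len(word) for a in alphabet]
print(M, Mv, empirical)
```

For morphisms whose substitution matrix does not have identical columns, the eigenvector has to be computed numerically or symbolically, but the comparison with empirical frequencies works in the same way.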

To show that there exist exponentially many words with a given letter frequency, or, in other
words, that the entropy of the set of $k$-powerfree words with a given letter frequency is positive,
a variant of Theorem 13 is used.

Theorem 14. Let $A = \{a_{11}, \ldots, a_{1r}, a_{21}, \ldots, a_{2r}, \ldots, a_{s1}, \ldots, a_{sr}\}$
and $B = \{b_1, \ldots, b_s\}$ be alphabets with $\mathrm{Card}(A) = rs$ and $\mathrm{Card}(B) = s$,
where $r, s > 1$ are integers. Assume that there exists an $\ell$-uniform $k$-powerfree morphism
$\varrho\colon A^* \to B^*$ with

$$\#_b \varrho(a_{ij}) = \#_b \varrho(a_{ij'})$$

for all $b \in B$, $1 \le i \le s$ and $1 \le j, j' \le r$. Define the $s \times s$ matrix $M$ with
elements

$$M_{ij} = \#_{b_i} \varrho(a_{j1}),$$

and denote its right Perron-Frobenius eigenvector (with eigenvalue $\ell$) by $(f_1, \ldots, f_s)^T$,
with statistical normalisation $f_1 + \ldots + f_s = 1$. Then, the entropy $h$ of the set of
$k$-powerfree words over $B$ with prescribed letter frequencies $f_i$ of $b_i$, $1 \le i \le s$, is
bounded by

$$h \ge \frac{\log r}{\ell - 1}.$$

Proof. The bound is the same as in Theorem 13, and the statement thus follows by showing that
the infinite words obtained from the uniform $k$-powerfree morphism $\varrho$ have letter frequencies
given by $f_1, \ldots, f_s$.

We again introduce the morphism $\varphi\colon A^* \to B^*$ by $\varphi(a_{ij}) := b_i$ for
$i = 1, \ldots, s$ and $j = 1, \ldots, r$. Every $k$-powerfree word of length $m$ over $B$ has $r^m$
different preimages under $\varphi$ which, by construction, consist only of $k$-powerfree words. These
words are mapped by $\varrho$, which is injective due to its $k$-powerfreeness, to different
$k$-powerfree words of length $m\ell$ over $B$. Due to the condition
$\#_b \varrho(a_{ij}) = \#_b \varrho(a_{ij'})$ on $\varrho$, the letter statistics do not depend on the
choice of the preimage under $\varphi$. The letter frequencies of words obtained by the procedure
described in the proof of Theorem 13 are thus well defined, and given by the right Perron-Frobenius
eigenvector of the $s \times s$ matrix $M$. □

Some results for binary cubefree words, as well as a discussion of the empirical frequency 
distribution of cubefree binary words obtained from the enumeration up to length 80, are detailed 
below. 



6.1 Binary cubefree words 

When counting the numbers $b(n)$ of binary cubefree words of length $n$ shown in Table 1, we also
counted the number $b(n, n_0)$ of words with $n_0$ occurrences of the letter 0. Clearly, these numbers
satisfy

$$\sum_{n_0=0}^{n} b(n, n_0) = b(n)$$

and $b(n, n - n_0) = b(n, n_0)$ as a consequence of the symmetry under permutation of letters. Their
values for $n = 80$ are given in Table 3.
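
For small lengths, the numbers $b(n, n_0)$ can be obtained by direct enumeration. The following Python sketch is a naive illustration of the two identities above; it is not the enumeration procedure used for length 80, which is far beyond the reach of such a brute-force approach. The function names are illustrative.

```python
from collections import Counter

def ends_with_cube(word):
    """True if `word` has a suffix of the form uuu with u non-empty."""
    for q in range(1, len(word) // 3 + 1):
        if word[-3 * q:-2 * q] == word[-2 * q:-q] == word[-q:]:
            return True
    return False

def zero_count_distribution(n):
    """Counter mapping n0 to b(n, n0) for binary cubefree words of length n."""
    words = [""]
    for _ in range(n):
        # Prefixes are cubefree already, so only suffix cubes can appear.
        words = [w + a for w in words for a in "01"
                 if not ends_with_cube(w + a)]
    return Counter(w.count("0") for w in words)

dist = zero_count_distribution(12)
print(sorted(dist.items()))
print(sum(dist.values()))      # equals b(12)
```

The smallest and largest keys of the resulting counter give the minimum and maximum number of zeros in a cubefree word of the chosen length, which is the finite-length information used for the frequency bounds below.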

According to Table 3, there are at least 32 and at most 48 occurrences of the letter 0 in any
cubefree binary word of length 80, so the frequency of a letter is bounded by
$2/5 \le f_0 \le 3/5$. A stronger bound has been obtained by Ochem [30], who showed (amongst many
results for a number of rational powers) that $f_0 > \frac{115}{283} \approx 0.40636$, using a
backtracking algorithm.

Table 3: The number of binary cubefree words of length 80 with given excess $e = n_0 - 40$ of
the letter 0.

     e    b(80, 40 + e)
     0    9502419002570
     1    7575510051076
     2    3805516412947
     3    1172047753336
     4    210113470848
     5    20038955440
     6    866998237
     7    12460464
     8    26819
   ≥ 9    0


One is interested in locating the minimum frequency $f_{\min}$ such that infinite cubefree words
with frequency $f_0 = f_{\min}$ exist, but not for any $f_0 < f_{\min}$. Clearly, the lower bound above
is a lower bound for $f_{\min}$. In order to obtain an upper bound, we need to prove the existence of
an infinite binary cubefree word of a given letter frequency. This is again done by using a cubefree
morphism, which provides an infinite word with well-defined letter frequencies. For instance,

$$0 \mapsto 011011010110110011011010110$$
$$1 \mapsto 011011010110110011010110110$$

is a uniform morphism of length 27 with substitution matrix
$\begin{pmatrix} 11 & 11 \\ 16 & 16 \end{pmatrix}$, so the infinite fixed point word has letter
frequencies $f_0 = \frac{11}{27}$ and $f_1 = \frac{16}{27}$. Hence we deduce that

$$0.406360 \approx \frac{115}{283} < f_{\min} \le \frac{11}{27} \approx 0.407407.$$
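
The letter counts behind this upper bound are easy to verify. The following Python sketch counts the letters in the two image words given above and derives the letter frequencies of the fixed point; it does not check that the morphism is cubefree, which requires one of the criteria discussed earlier.

```python
from fractions import Fraction

morphism = {
    "0": "011011010110110011011010110",
    "1": "011011010110110011010110110",
}

length = len(morphism["0"])
assert all(len(image) == length for image in morphism.values())  # uniform

# Substitution matrix: rows count the letters 0 and 1, columns are the images.
M = [[morphism[b].count(a) for b in "01"] for a in "01"]

# Both columns coincide, so the normalised column is the
# Perron-Frobenius eigenvector.
f0 = Fraction(M[0][0], length)
f1 = Fraction(M[1][0], length)

print(M, f0, f1)
print(float(f0) > 0.406360)   # consistent with the quoted lower bound
```

Since the two images contain the same number of zeros, the frequencies can be read off directly from either image without any eigenvector computation.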

Using the data from our enumeration of binary cubefree words up to length 80, we can study
the empirical distribution for small length, and try to conjecture the behaviour for large words.
Figure 2 shows a plot of the normalised data $b(80, 40 + e)/b(80)$ of Table 3, compared with a
Gaussian distribution, which appears to fit the data very well. Here, the Gaussian profile was
determined from the variance of the data points, which is $\sigma^2 \approx 2.124$.
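
The variance can be recomputed directly from the entries of Table 3. The following Python sketch does so and tabulates the normalised data next to a centred Gaussian density with the same variance; the helper `gaussian` is illustrative, and the continuous density is compared with the discrete data without any continuity correction.

```python
import math

# Table 3: b(80, 40 + e) for e = 0, ..., 8; the counts are symmetric in e.
TABLE_3 = [
    9502419002570, 7575510051076, 3805516412947, 1172047753336,
    210113470848, 20038955440, 866998237, 12460464, 26819,
]

total = TABLE_3[0] + 2 * sum(TABLE_3[1:])                    # b(80)
second_moment = 2 * sum(e * e * b for e, b in enumerate(TABLE_3))
variance = second_moment / total                             # the mean excess vanishes

def gaussian(e, var):
    """Density of a centred Gaussian with variance `var`."""
    return math.exp(-e * e / (2 * var)) / math.sqrt(2 * math.pi * var)

print(f"variance = {variance:.3f}")
for e, b in enumerate(TABLE_3):
    print(e, b / total, gaussian(e, variance))
```

The mean of the excess is zero by the symmetry $b(80, 40 + e) = b(80, 40 - e)$, so the second moment coincides with the variance.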

To draw any conclusions on the limit of large word length, we need to consider the scaling of
the distribution with the word length $n$. The first step is to determine how the variance scales
with $n$. A plot of the numerical data is given in Figure 3, which shows that, for large $n$, the
variance appears to scale linearly with $n$. A least squares fit to the data points for
$40 \le n \le 80$ gives a slope of 0.021616.

Assuming that the distribution for fixed $n$ is Gaussian, the suitably re-scaled data

$$g_n(x) = \sqrt{n}\, \frac{b(n, \frac{n}{2} + e)}{b(n)},$$

considered as a function of the rescaled letter excess

$$x = \frac{e}{\sqrt{n}},$$

should approach a Gaussian distribution

Figure 2: Distribution of cubefree words of length 80 as a function of the excess $e = n_0 - 40$,
compared to a Gaussian distribution with the same variance $\sigma^2$. (Vertical axis:
$b(80, n_0)/b(80)$.)

Figure 3: Variance of the distribution of the letter frequency in binary cubefree words of length $n$.
(Horizontal axis: $n$ from 10 to 80.)

with variance $\sigma^2 \approx 0.021616$. Figure 4 shows a plot of this distribution, together with
the data points obtained for $40 \le n \le 80$. Clearly, there are some deviations, which is to be
expected due to the fact that the relationship between the variance and the length shown in
Figure 3, while being asymptotically linear, is not a proportionality; however, the overall agreement
is reasonable. A plausible conjecture, therefore, is that the scaled distribution becomes Gaussian in
the limit of large word length. In terms of the entropy, the observed concentration property is
consistent with the entropy maximum occurring at letter frequency 1/2, and a lower entropy for other
letter frequencies. This is similar to the observed and conjectured behaviour for ternary squarefree
words in Ref. [26].

By an application of Theorem 14, the cubefree morphisms of Eq. (6) show that the entropy for
the case of letter frequency $f_0 = f_1 = 1/2$ is positive. More interesting is the case of non-equal
letter frequencies. As an example, consider the 13-uniform morphism

$$a_{11} \mapsto 0010010110011$$
$$a_{12} \mapsto 0010011010011$$
$$a_{21} \mapsto 0010110010011$$
$$a_{22} \mapsto 0100101001011,$$

where all words on the right-hand side comprise seven letters 0 and six letters 1. One can check,
using the criteria for cubefree morphisms stated earlier, that this morphism is cubefree.
Consequently, the matrix $M$ of Theorem 14 is $M = \begin{pmatrix} 7 & 7 \\ 6 & 6 \end{pmatrix}$, and
the letter frequencies of any word constructed by

Figure 4: The scaled data $g_n(x)$ for lengths $40 \le n \le 80$, compared to a Gaussian distribution.




this morphism are $f_0 = 7/13$ and $f_1 = 6/13$. Hence, the set of binary cubefree words with letter
frequencies $f_0 = 7/13$ and $f_1 = 6/13$ has positive entropy bounded by
$h \ge \frac{1}{6}\log 2 \approx 0.115525$ (and, by symmetry, this also holds for $f_0 = 6/13$ and
$f_1 = 7/13$). Again, as in the case of ternary squarefree words discussed in Ref. [26], it is
plausible to conjecture that the entropy is positive on an entire interval of letter frequencies
around 1/2 (where it is maximal), presumably on $(f_{\min}, 1 - f_{\min})$.
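
The hypotheses of Theorem 14 on the letter counts, and the resulting frequencies, can be checked mechanically for the 13-uniform morphism above. The following Python sketch labels the letters $a_{ij}$ by index pairs, verifies that images within each block have identical letter counts, builds the matrix $M$, and approximates its Perron-Frobenius eigenvector by power iteration. All names are illustrative, and cubefreeness of the morphism is not tested here.

```python
morphism = {
    (1, 1): "0010010110011",
    (1, 2): "0010011010011",
    (2, 1): "0010110010011",
    (2, 2): "0100101001011",
}
s, r, letters = 2, 2, "01"

def satisfies_count_condition(morphism, s, r, letters):
    """Images of a_{ij} and a_{ij'} contain every letter equally often."""
    return all(
        morphism[(i, j)].count(b) == morphism[(i, 1)].count(b)
        for i in range(1, s + 1) for j in range(1, r + 1) for b in letters
    )

# M[i][j] = number of occurrences of the i-th letter in the image of a_{j1}.
M = [[morphism[(j, 1)].count(b) for j in range(1, s + 1)] for b in letters]

def perron_frobenius(M, steps=100):
    """Normalised Perron-Frobenius eigenvector by power iteration."""
    v = [1.0 / len(M)] * len(M)
    for _ in range(steps):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
        total = sum(v)
        v = [x / total for x in v]
    return v

freqs = perron_frobenius(M)
print(satisfies_count_condition(morphism, s, r, letters))
print(M, freqs)
```

Power iteration is used only because it needs no external libraries; for this particular matrix, whose columns coincide, the eigenvector is reached after a single step.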



6.2 Ternary squarefree words 

Letter frequencies in ternary squarefree words were first studied by Tarannikov [49], who derived
lower and upper bounds on the minimal letter frequency $f_{\min}$; see [49, Thm. 4.2]. These bounds
have recently been improved by Ochem [30] to

$$0.2746498 \approx \frac{1000}{3641} \le f_{\min} \le \frac{883}{3215} \approx 0.2746501,$$

who also showed that the maximum frequency $f_{\max}$ of a letter in a ternary squarefree word is
bounded by

$$f_{\max} \le \frac{255}{653} \approx 0.390505;$$

see [30, Thm. 1]. Very recently, Khalyavin [33] proved that the minimum frequency is indeed equal
to Ochem's upper bound, so

$$f_{\min} = \frac{883}{3215},$$

which finally settles this question.

By constructing suitable squarefree morphisms in accordance with Theorem 14, Richard and
Grimm [26] showed that, for a number of letter frequencies, the number of ternary squarefree
words grows exponentially. This has recently been further investigated by Ochem [32].

7 Summary and Outlook



In this paper, we reviewed recent progress on the combinatorics of $k$-powerfree words, with
particular emphasis on the examples of binary cubefree and ternary squarefree words, which have
attracted most attention over the years. Recent work in this area, using extensive computer
searches, but also new methods, has led to a drastic improvement of the known bounds for the
entropy of these sets. No analytic expression for the entropy is known to date, and the results on
the generating function for the sets of length-$p$ powerfree words indicate that this may be out of
reach. However, considerable progress has been made on other combinatorial questions, such as
letter frequencies, where again bounds have been improved, but eventually also a definite answer
has emerged, in this case on the minimum letter frequency in ternary squarefree words.

We also presented some new results on binary cubefree words, including an enumeration of the
number of words and their letter frequencies for length up to 80. The empirical distribution of
the number of words as a function of the excess of one letter was investigated, and conjectured to
become Gaussian in the limit of infinite word length after suitable scaling. We also found bounds
on the letter frequency in binary cubefree words, and showed that exponentially many words with
unequal letter frequency exist, as in the case of ternary squarefree words. The analysis of the
generating functions of length-$p$ binary cubefree words, which we calculated for $p \le 14$, also
shows striking similarity to the case of ternary squarefree words, suggesting that the observed
behaviour may be generic for sets of $k$-powerfree words.

While a lot of progress has been made, there remain many open questions. For instance, is
there an explanation for the observed accumulation of poles and zeros of the generating functions
on or near the unit circle, and is it possible to prove what happens in the limit $p \to \infty$?
How does the entropy depend on the power, say for binary $k$-powerfree words? A partial answer
to this question is given in Ref. [25], but it would be nice to show that, at least in some region,
the entropy increases by a finite amount at any rational value of $k$, as one might expect.
Concerning powerfree words with given letter frequencies, how does the entropy vary as
a function of the frequency? One might conjecture that the entropy changes continuously, but at
present all we have are results that the entropy is positive for some very specific frequencies,
namely those for which powerfree morphisms have been found. Some of these questions may be too hard
to hope for an answer in full generality, but the recent progress in the area shows that one should
keep looking for alternative approaches which may succeed.